Psychometric Performance of a Substance Use Symptom Checklist to Help Clinicians Assess Substance Use Disorder in Primary Care

Key Points
Question: What are the psychometric properties of a Substance Use Symptom Checklist used routinely in primary care among patients reporting high-risk cannabis and/or other drug use?
Findings: In this cross-sectional study of 23 304 positive screens for daily cannabis or any other drug use, the 11-item checklist provided scaled, unidimensional information on the presence and severity of substance use disorder. The checklist performed well across patient age, sex, race, and ethnicity.
Meaning: The findings of this study support the use of the checklist in primary care as a tool to aid clinicians in eliciting patient symptoms, identifying a spectrum of substance use disorder severity, and clinical decision-making based on diagnostic criteria.


eFigure 1. Routine Behavioral Health Screen With Single Items for Cannabis and Other Drugs
Caption: The annual Behavioral Health Questionnaire includes single items for cannabis (#6) and any other drug use (#7). The questionnaire is prefaced with "Once a year, we ask all our patients to complete this form on conditions that affect their health. Please help us provide you with the best medical care by answering the questions below."

eAppendix 1. Detailed Description of Item Characteristics
Daily cannabis only
All items had high discrimination parameters,¹ ranging from 1.42 (tolerance) to 2.84 (neglect roles), demonstrating a strong association with SUD severity. Severity parameters ranged from 1.19 (physical/psychological problems) to 2.40 (hazardous use). Items with lower severity parameters (e.g., tolerance, physical/psychological problems, craving) discriminated best when latent SUD was mild, whereas items with higher severity parameters (e.g., time spent, neglect roles, and hazardous use) discriminated best when latent SUD was severe. See Table 3 for item parameters and Figure 2 for item characteristic curves among patients who reported daily cannabis use only.
Other drug use only
All items had extremely high discrimination parameters,¹ ranging from 2.79 (tolerance) to 5.72 (time spent), and severity parameters ranging from 0.74 (physical/psychological problems) to 1.37 (hazardous use). One item (physical/psychological problems) discriminated best when latent SUD was mild, and two items (time spent, hazardous use) discriminated best when latent SUD was severe. See Table 3 for item parameters and Figure 2 for item characteristic curves among patients who reported other drug use only.
Daily cannabis and other drug use
All items had high discrimination parameters,¹ ranging from 1.55 (tolerance) to 3.61 (neglect roles), and severity parameters ranging from 0.31 (tolerance) to 1.13 (hazardous use). As with the cannabis-only subsample, some items (tolerance, physical/psychological problems, craving) discriminated best when latent SUD was mild, whereas other items (withdrawal, time spent, neglect roles, hazardous use, and activities given up) discriminated best when latent SUD was severe. See Table 3 for item parameters and Figure 2 for item characteristic curves among patients who reported both daily cannabis and other drug use.
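As an illustration of how discrimination and severity parameters shape an item characteristic curve, the following minimal Python sketch assumes a two-parameter logistic (2PL) form, which the text above does not state explicitly. The function names and the two example items are hypothetical and are not the fitted values reported in Table 3.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of endorsing an item at latent severity theta.

    Assumes a two-parameter logistic form: a = discrimination, b = severity.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_slope(theta, a, b):
    """Slope of the curve at theta; larger slopes mean sharper discrimination."""
    p = icc_2pl(theta, a, b)
    return a * p * (1.0 - p)


# Hypothetical items chosen only for illustration (not values from Table 3).
low_severity_item = {"a": 1.5, "b": 1.2}
high_severity_item = {"a": 2.8, "b": 2.4}

for theta in (0.0, 1.2, 2.4):
    low = icc_slope(theta, **low_severity_item)
    high = icc_slope(theta, **high_severity_item)
    print(f"theta={theta:.1f}  slope(low-b item)={low:.3f}  slope(high-b item)={high:.3f}")
```

Under this assumed form, each item is endorsed with probability 0.5 when latent severity equals its severity parameter, and its slope peaks there at a/4, which is why items with lower severity parameters separate patients best at milder levels of latent SUD.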

eAppendix 2. Detailed Description of Differential Item Functioning (DIF) Analyses
For each subsample (patients who reported daily cannabis only, other drug use only, both daily cannabis and other drug use), we tested for differential item functioning (DIF) by demographic factors and then examined the impact DIF had on the clinical utility of the Substance Use Symptom Checklist, as previously done for psychometric evaluation of an Alcohol Symptom Checklist.² Specifically, we tested whether item-level severity and discrimination parameters differed by age, sex, race, and ethnicity using a likelihood ratio test that compared a more complex model, in which item parameters were estimated separately (i.e., freely estimated) for each demographic subgroup, with a simpler model that assumed item parameters were the same across subgroups.³,⁴ In the freely estimated model, discrimination and severity parameters were freely estimated for all items, with the exception of three "anchor" items for which we constrained parameters to be equal across demographic subgroups so that differences in the latent means and variances could be estimated between groups (e.g., male and female patients) without biasing DIF tests.⁴,⁵ We selected the three most consistently discriminating items⁴ across the three subsamples to be anchor items: time spent, neglect roles, and activities given up. The most populous subgroup in each demographic category was selected as the reference group, with its latent mean set at 0 and its latent variance set at 1. Latent means and variances were freely estimated in other subgroups. We used an alpha level of 0.05/11 (approximately 0.0045) to account for multiple comparisons across the 11 items. DIF results for each demographic subgroup within each subsample are presented in Supplemental Tables 1-8.

DIF may be present without having a clinically meaningful impact on the performance of the Substance Use Symptom Checklist. For example, DIF may be present in opposite directions for different items, effectively canceling out.³ Additionally, DIF may be small in magnitude but still statistically significant because of a large sample size.⁶ Because the total number of symptoms (i.e., DSM-5 criteria) endorsed on the Substance Use Symptom Checklist is used by clinicians to determine the presence and severity of SUD, it is useful to examine the impact of DIF on the total expected number (0-11) of criteria endorsed. Within each subsample, we used the IRT model that freely estimated item parameters to calculate (1) the maximum difference in the total expected number of criteria between subgroups at any point along the severity continuum and (2) the maximum difference between subgroups at the mild (2-3 symptoms), moderate (4-5 symptoms), and severe (≥6 symptoms) thresholds of SUD,⁷ which could affect clinical decision-making regarding diagnosis and treatment of SUD. Differences are summarized in the main paper, presented in Supplemental Table 9, and graphically illustrated in the supplemental figures.

Lastly, we compared freely estimated models with correction for DIF and constrained models without correction for DIF to determine whether any item-level DIF led to meaningful differences in model fit. A difference in comparative fit index (CFI) values >0.01 has been proposed⁸ as another criterion for determining whether DIF has a meaningful impact on the absolute model fit of the factor analysis. Differences in fit indices are presented in Supplemental Table 10.
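The likelihood ratio comparison and the multiple-comparison threshold described in this appendix can be sketched in Python as follows. This is a minimal illustration, not the analysis code used for the study: the log-likelihood values are hypothetical, the helper names are invented for this example, and the degrees of freedom assume a two-group comparison in which freeing one item's discrimination and severity parameters adds two parameters.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using closed forms for df 1 and 2 only."""
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


def lr_test(loglik_constrained, loglik_free, df):
    """Likelihood ratio statistic and p-value for nested models."""
    stat = max(2.0 * (loglik_free - loglik_constrained), 0.0)
    return stat, chi2_sf(stat, df)


N_ITEMS = 11
ALPHA = 0.05 / N_ITEMS  # per-item threshold, roughly 0.0045

# Hypothetical log-likelihoods for one item tested between two subgroups.
stat, p_value = lr_test(loglik_constrained=-10250.0, loglik_free=-10242.0, df=2)
flagged = p_value < ALPHA
print(f"LR statistic={stat:.2f}, p={p_value:.5f}, DIF flagged={flagged}")
```

In practice a general chi-square distribution function from a statistics library would replace the two closed forms used here, and comparisons involving more than two subgroups would carry more degrees of freedom.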

eAppendix 3. DIF Findings for Patients Who Reported Daily Cannabis Use Only
Among the subsample who reported daily cannabis use only, there was significant DIF associated with age (6 items), sex (3 items), and race (2 items), but not Hispanic ethnicity.

By age
Six items (tolerance, withdrawal, physical/psychological problems, hazardous use, social/interpersonal problems, and craving) had significant differential item functioning by age for both the discrimination and severity parameters.
In addition to DIF, there were differences in latent means and variances between some age groups. Patients aged 18-24 had, on average, higher SUD severity (latent mean >0) than patients aged 25-44, whereas patients aged 45-64 and 65+ had, on average, lower SUD severity (latent mean <0) than patients aged 25-44. In all other age groups, latent SUD severity was less variable (latent variance <1) than among patients aged 25-44.
Note. * = reference group, † = anchor item, a = discrimination parameter estimate, b = severity parameter estimate. Item parameters that significantly differed from the reference group are presented in the table; item parameters that did not significantly differ from the reference group or that were fixed as anchoring items are indicated with dashes (-). Latent means and variances were fixed to 0 and 1, respectively, for the reference group and were freely estimated for non-reference groups.

By sex
Three items (larger/longer, hazardous use, craving) had significant differential item functioning by sex for both discrimination and severity parameters.
In addition to DIF, female patients had lower and more variable SUD severity (latent mean <0 and latent variance >1), on average, than male patients.

By race
One item (quit/control) had significant DIF by race for the discrimination parameter and two items (tolerance, quit/control) had significant DIF for the severity parameter. Analyses may have been underpowered to detect DIF given small numbers for some races (see Table 1).
In addition to DIF, American Indian/Alaska Native, Asian, Black/African American, and Native Hawaiian/Pacific Islander patients had, on average, higher and less variable SUD severity (latent mean >0; latent variance <1) than White patients.
Latent mean: 0.00, 0.17, 0.33, 0.26, 0.11
Latent variance: 1.00, 0.77, 0.95, 0.78, 0.79

By ethnicity
There was no DIF by ethnicity, although analyses may have been underpowered to detect any differences due to small numbers in some subgroups.

eAppendix 4. DIF Findings for Patients Who Reported Other Drug Use Only
Among the subsample who reported other drug use only, there was significant DIF associated with age (2 items), sex (1 item), and ethnicity (1 item), but not race.

By age
Two items (tolerance and quit/control) had significant DIF by age for both discrimination and severity parameters.
In addition to DIF, patients aged 18-24 and 45-64 had higher (latent mean >0) but less variable (latent variance <1) SUD severity than patients aged 25-44. Patients aged 65 and over had lower (latent mean <0) and less variable (latent variance <1) SUD severity.

By sex
One item (hazardous use) had significant DIF by sex for both discrimination and severity parameters.
In addition to DIF, female patients had slightly lower (latent mean <0) and more variable SUD severity (latent variance >1) than male patients.

By race
There was no DIF by race although analyses may have been underpowered to detect any differences due to small numbers of some subgroups.

By ethnicity
One item (craving) had significant DIF by Hispanic ethnicity for the severity parameter.
In addition to DIF, Hispanic patients had, on average, higher (latent mean >0) and less variable (latent variance <1) SUD severity than non-Hispanic patients.

Caption: Using freely estimated models that corrected for DIF, the total expected number of SUD criteria endorsed on the Substance Use Symptom Checklist (y-axis) was plotted as a function of latent SUD severity (x-axis) for each subgroup. The vertical distances between curves represent the difference in total expected scores between subgroups with the same latent SUD severity. This difference was small, as indicated by test characteristic curves that nearly overlap, and never exceeded 2/3 of one criterion, indicating that DIF had minimal cumulative impact.
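The quantity plotted in such a figure, the total expected number of criteria as a function of latent severity, and the largest vertical gap between two subgroups' curves can be sketched as follows. This is a minimal Python illustration that assumes two-parameter logistic items; the parameter sets are hypothetical (the two groups differ on a single item) and are not the estimates from the supplemental tables.

```python
import math


def icc_2pl(theta, a, b):
    """Endorsement probability under an assumed two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_count(theta, items):
    """Test characteristic curve: expected number of criteria endorsed at theta."""
    return sum(icc_2pl(theta, a, b) for a, b in items)


def max_tcc_gap(items_ref, items_focal, lo=-4.0, hi=4.0, steps=801):
    """Largest absolute gap between two curves on an evenly spaced theta grid."""
    best_gap, best_theta = 0.0, lo
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        gap = abs(expected_count(theta, items_ref) - expected_count(theta, items_focal))
        if gap > best_gap:
            best_gap, best_theta = gap, theta
    return best_gap, best_theta


# Hypothetical (a, b) pairs for 11 items; the focal group differs on one item only.
reference = [(2.0, 0.5 + 0.15 * i) for i in range(11)]
focal = list(reference)
focal[0] = (2.0, 0.8)

gap, theta_at_gap = max_tcc_gap(reference, focal)
print(f"maximum difference = {gap:.2f} criteria at theta = {theta_at_gap:.2f}")
```

Because the expected count is a sum over items, DIF in opposite directions on different items can offset in this total, which is the cancellation scenario the methods describe.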

eAppendix 5. DIF Findings for Patients Who Reported Both Daily Cannabis and Other Drug Use
Among the subsample of patients who reported both daily cannabis use and other drug use, there was significant DIF associated with age (1 item) and sex (1 item), but not race or ethnicity.

By age
One item (tolerance) had DIF by age for the discrimination parameter, and three items (tolerance, quit/control, craving) had DIF by age for the severity parameter.
In addition to DIF, patients aged 18-24 had, on average, higher (latent mean >0) and less variable (latent variance <1) SUD severity than patients aged 25-44, while patients aged 45-64 and 65+ had, on average, lower SUD severity (latent mean <0) than patients aged 25-44. Patients aged 45-64 had more variable SUD severity (latent variance >1), while patients aged 65+ had less variable SUD severity (latent variance <1), than patients aged 25-44.

By sex
One item (hazardous use) had significant DIF by sex for the severity parameter.
In addition to DIF, female patients had, on average, lower and more variable SUD severity (latent mean <0; latent variance >1) than male patients.

By race and ethnicity
There was no significant DIF by race or ethnicity. Analyses may have been underpowered to detect differences due to small numbers in some demographic subgroups.
daily cannabis use and other drug use on routine screening March 2015-March 2020 (n=2,373)
Caption: Using freely estimated models that corrected for DIF, the total expected number of SUD criteria endorsed on the Substance Use Symptom Checklist (y-axis) was plotted as a function of latent SUD severity (x-axis) for each subgroup. The vertical distances between curves represent the difference in total expected scores between subgroups with the same latent SUD severity. This difference was small, as indicated by test characteristic curves that nearly overlap, and never exceeded 2/3 of one criterion, indicating that DIF had minimal cumulative impact.

eAppendix 6. Detailed Description of the Clinical Impact DIF Has on Estimated SUD Severity
Patients who reported daily cannabis use only
For patients who reported daily cannabis use only, the clinical impact of DIF was minimal in all cases. Differences in expected DSM-5 criteria counts for persons from different demographic subgroups with the same latent SUD severity were small and never exceeded half of one criterion (Supplemental Figure 3), suggesting that differential item functioning had minimal cumulative impact on total criterion counts. When SUD severity was held constant, differential item functioning was expected to produce differences in SUD criteria that never exceeded 0.42 criteria (out of 11 possible) for age (patients 65+ reporting more criteria), 0.09 criteria for sex (female patients reporting more criteria), and 0.20 criteria for race (NH/PI patients reporting more criteria). These maximum differences tended to occur at high levels of latent SUD severity (e.g., more than 6 criteria). At clinical decision-making thresholds for mild, moderate, and severe SUD, differential item functioning was expected to produce even smaller differences (Supplemental Table 9). Further, correcting for DIF did not meaningfully improve model fit (∆CFI<0.01; Supplemental Table 10). In other words, the gain in model fit from allowing parameters to be freely estimated for each demographic subgroup (versus constrained to be equal across demographic subgroups) was very small.
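For reference, the clinical decision-making thresholds mentioned above map a criteria count to a severity category as in the short Python sketch below. The mild, moderate, and severe ranges are those given in eAppendix 2; the function name and the label used for counts of 0 or 1 are assumptions made for this illustration.

```python
def sud_severity(criteria_count):
    """Map a count of endorsed DSM-5 criteria (0-11) to a severity category."""
    if not 0 <= criteria_count <= 11:
        raise ValueError("criteria count must be between 0 and 11")
    if criteria_count >= 6:
        return "severe"
    if criteria_count >= 4:
        return "moderate"
    if criteria_count >= 2:
        return "mild"
    return "below threshold"  # label assumed here for counts of 0 or 1


for count in (1, 2, 4, 6):
    print(count, sud_severity(count))
```

Because the categories change only at whole-number counts, expected differences well under one criterion, such as those reported here, would seldom move a patient across a threshold.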

Patients who reported other drug use only
For patients who reported other drug use only, the clinical impact of DIF was also minimal. Differences in expected DSM-5 criteria counts for persons from different demographic subgroups with the same latent SUD severity never exceeded two thirds of one criterion (Supplemental Figure 4). When SUD severity was held constant, differential item functioning was expected to produce differences in SUD criteria that never exceeded 0.66 criteria (out of 11 possible) for age (patients 65+ reporting more criteria), 0.11 criteria for sex (female patients reporting fewer criteria), and 0.17 criteria for ethnicity (Hispanic patients reporting fewer criteria). At clinical decision-making thresholds for mild, moderate, and severe SUD, the largest differences were expected at the severe threshold (Supplemental Table 9). Correcting for DIF did not meaningfully improve model fit (∆CFI<0.01; Supplemental Table 10).

Patients who reported both daily cannabis and other drug use
For patients who reported both daily cannabis and other drug use, the clinical impact of DIF was again minimal. Differences in expected DSM-5 criteria counts for persons from different demographic subgroups with the same latent SUD severity never exceeded two thirds of one criterion (Supplemental Figure 5). When SUD severity was held constant, differential item functioning was expected to produce differences in SUD criteria that never exceeded 0.57 criteria (out of 11 possible) for age (patients 65+ reporting fewer criteria) and 0.17 criteria for sex (female patients reporting fewer criteria). At clinical decision-making thresholds for mild, moderate, and severe SUD, differential item functioning was expected to produce even smaller differences (Supplemental Table 9). As with the prior two subsamples, correcting for DIF did not meaningfully improve model fit (∆CFI<0.01; Supplemental